Visual question answering based on spatial DCTHash dynamic parameter network
Related resources
Speech-Based Visual Question Answering
This paper introduces the task of speech-based visual question answering (VQA), that is, to generate an answer given an image and an associated spoken question. Our work is the first study of speech-based VQA with the intention of providing insights for applications such as speech-based virtual assistants. Two methods are studied: an end-to-end deep neural network that directly uses audio wavef...
Dual Attention Network for Visual Question Answering
Visual Question Answering (VQA) is a popular research problem that involves inferring answers to natural language questions about a given visual scene. Recent neural network approaches to VQA use attention to select relevant image features based on the question. In this paper, we propose a novel Dual Attention Network (DAN) that not only attends to image features, but also to question features....
VIBIKNet: Visual Bidirectional Kernelized Network for Visual Question Answering
In this paper, we address the problem of visual question answering by proposing a novel model, called VIBIKNet. Our model is based on integrating Kernelized Convolutional Neural Networks and Long Short-Term Memory units to generate an answer given a question about an image. We prove that VIBIKNet is an optimal trade-off between accuracy and computational load, in terms of memory and time consum...
Dynamic Memory Network on Natural Language Question-Answering
Question-Answering (QA) is an important milestone for the research of artificial intelligence (AI). In this work, we explore the application of a memory-based neural network model to reading-comprehension type QA tasks. Based on the idea of dynamic network model (DMN) [1], we re-implement both unsupervised and supervised DMN, establishing a baseline on the bAbI QA dataset, and applying to a more comp...
FVQA: Fact-based Visual Question Answering
Visual Question Answering (VQA) has attracted much attention in both computer vision and natural language processing communities, not least because it offers insight into the relationships between two important sources of information. Current datasets, and the models built upon them, have focused on questions which are answerable by direct analysis of the question and image alone. The set of su...
Journal
Journal title: SCIENTIA SINICA Informationis
Year: 2017
ISSN: 1674-7267
DOI: 10.1360/n112016-00288